MSR KMG at TREC 2014 KBA Track Vital Filtering Task

Authors

  • Jingtian Jiang
  • Chin-Yew Lin
  • Yong Rui
Abstract

In this paper, we present our strategy for the Vital Filtering task of the TREC 2014 KBA track. This task was known as "Cumulative Citation Recommendation" (CCR) in 2012 and 2013. The Vital Filtering task is to identify "vital" documents containing timely and new information that should be used to update the profile of a given entity (also called a topic). Our strategy is to first retrieve as many relevant documents as possible and then apply classification and ranking methods to differentiate vital documents from non-vital documents. We first index the corpus and retrieve candidate documents by combining entity names and their redirect names as phrase queries. We then learn to rank documents by leveraging four types of features: 1) time range: earlier documents get a higher score than later documents; 2) temporal feature: bursts of entity mentions; 3) title/profession feature: the title and profession information around an entity mention; and 4) action pattern: the entity name and its associated verb in the sentence mentioning the entity. A simple global adjustment is applied at the end to further improve system performance. Our experimental results confirm that these features are very effective, especially action pattern and time range. The system incorporating all the proposed features significantly outperforms the phrase query baseline.

Categories & Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Information Filtering; H.3.m [Information Storage and Retrieval]: Miscellaneous – Test Collections; I.2.7 [Natural Language Processing]: Text analysis – Language parsing and understanding
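The retrieval step and three of the four feature types named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the linear time-range score, the per-day burst ratio, and the verb-list lookup for action patterns are all assumptions made for illustration, since the abstract does not specify how each feature is computed.

```python
import re
from collections import Counter
from datetime import datetime


def phrase_match(text, names):
    """True if any entity name or redirect name occurs as an exact phrase."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(n.lower()) + r"\b", lowered) for n in names
    )


def time_range_score(doc_time, start, end):
    """Assumed linear score in [0, 1]: earlier documents score higher."""
    span = (end - start).total_seconds()
    if span <= 0:
        return 0.0
    frac = (doc_time - start).total_seconds() / span
    return 1.0 - min(max(frac, 0.0), 1.0)


def burst_score(doc_time, mention_times):
    """Assumed burst measure: mentions on the document's day divided by
    the mean mention count over days that have at least one mention."""
    per_day = Counter(t.date() for t in mention_times)
    if not per_day:
        return 0.0
    mean = sum(per_day.values()) / len(per_day)
    return per_day.get(doc_time.date(), 0) / mean


def action_pattern(sentence, name, verbs):
    """Return the first verb from a hypothetical verb list that follows
    the entity name in the sentence, or None if there is no such verb."""
    lowered = sentence.lower()
    idx = lowered.find(name.lower())
    if idx < 0:
        return None
    for token in re.findall(r"[a-z]+", lowered[idx + len(name):]):
        if token in verbs:
            return token
    return None
```

In a learning-to-rank setup, the outputs of functions like these would be combined into one feature vector per candidate document. The title/profession feature is omitted here because it would need a lexicon the abstract does not describe.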

Similar resources

WHU at TREC KBA Vital Filtering Track 2014

This paper describes the WHU IRLAB participation in the Vital Filtering task of the TREC 2014 Knowledge Base Acceleration Track. In this task, we implemented a system to detect vital documents that a human editor could use to update or create the profile of an entity. Our approach is to view the problem as a classification problem and use Stanford NLP Toolkit to extract necessary inform...

BIT and Purdue at TREC-KBA-CCR Track 2014

This report summarizes our participation in the KBA-CCR track at TREC 2014. Our submissions are generated in two steps: (1) filtering a candidate document collection from the stream corpus for a set of target entities; and (2) estimating the relevance levels between candidate documents and target entities. Three kinds of approaches are employed in the second step, including query expansion, classi...

Evaluating Stream Filtering for Entity Profile Updates in TREC 2012, 2013, and 2014

The Knowledge Base Acceleration (KBA) track ran in TREC 2012, 2013, and 2014 as an entity-centric filtering evaluation. This track evaluates systems that filter a time-ordered corpus for documents and slot fills that would change an entity profile in a predefined list of entities. Compared with the 2012 and 2013 evaluations, the 2014 evaluation introduced several refinements, including high-qual...

IRIT at TREC KBA 2014

This paper describes the IRIT lab participation in the Vital Filtering task (also known as Cumulative Citation Recommendation) of the TREC 2014 Knowledge Base Acceleration Track. This task aims at identifying vital documents containing timely new information that should help a human to update the profile of the target entity (e.g., the Wikipedia page of the entity). In this work, we evaluate two fa...

Filtering Documents over Time on Evolving Topics - The University of Amsterdam at TREC 2013 KBA CCR

In this paper we describe the University of Amsterdam’s approach to the TREC 2013 Knowledge Base Acceleration (KBA) Cumulative Citation Recommendation (CCR) track. The task is to filter a stream of documents for documents relevant to a given set of entities. We model the task as a multi-class classification task. Entities may evolve over time and the classifier should be able to adapt to these ...

Publication date: 2015